
Sparse K-Means with $\ell_{\infty}/\ell_0$ Penalty for High-Dimensional Data Clustering



Abstract

Sparse clustering, which aims to find a proper partition of an extremely high-dimensional data set with redundant noise features, has attracted increasing interest in recent years. Existing studies commonly solve the problem in a framework that maximizes the weighted feature contributions subject to an $\ell_2/\ell_1$ penalty. Nevertheless, this framework has two serious drawbacks. First, in many situations its solution unavoidably includes a considerable portion of redundant noise features. Second, it neither offers an intuitive explanation of why it can select relevant features nor provides any theoretical guarantee of feature selection consistency.

In this article, we attempt to overcome these drawbacks by developing a new sparse clustering framework that uses an $\ell_{\infty}/\ell_0$ penalty. First, we introduce new concepts of optimal partitions and noise features for high-dimensional data clustering problems, on the basis of which the previously known framework can be intuitively explained in principle. Then, we apply the suggested $\ell_{\infty}/\ell_0$ framework to formulate a new sparse k-means model with the $\ell_{\infty}/\ell_0$ penalty ($\ell_0$-k-means for short). We propose an efficient iterative algorithm for solving the $\ell_0$-k-means. To understand the behavior of $\ell_0$-k-means in depth, we prove that the solution yielded by the $\ell_0$-k-means algorithm has feature selection consistency whenever the data matrix is generated from a high-dimensional Gaussian mixture model. Finally, we provide experiments with both synthetic data and the Allen Developing Mouse Brain Atlas data to support the claim that the proposed $\ell_0$-k-means exhibits better noise feature detection capacity than the previously known sparse k-means with the $\ell_2/\ell_1$ penalty ($\ell_1$-k-means for short).
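Because the abstract does not spell out the model or the iterative algorithm, the following is only an illustration under stated assumptions. It assumes the framework maximizes $\sum_j w_j a_j$, where $a_j$ is the between-cluster sum of squares (BCSS) of feature $j$ for the current partition, and that the $\ell_{\infty}/\ell_0$ penalty takes the form of the constraints $0 \le w_j \le 1$ and $\|w\|_0 \le s$. Under those assumptions the weight update has a closed form. Every $a_j \ge 0$ (total sum of squares minus within-cluster sum of squares), so the optimum puts $w_j = 1$ on the $s$ features with the largest BCSS and $w_j = 0$ elsewhere. Weighted k-means with 0/1 weights is then ordinary k-means on the selected features. The sketch alternates these two steps. It is not the authors' algorithm: the names `l0_kmeans_sketch`, `bcss_per_feature`, `top_s_weights`, the sparsity level `s`, and the farthest-first initialization are hypothetical choices made for this example.

```python
import random

# Hypothetical sketch: assumes the ell_inf/ell_0 penalty means
# 0 <= w_j <= 1 and ||w||_0 <= s on per-feature weights w.


def _sq_dist(a, b, features):
    # Squared Euclidean distance using only the selected feature indices.
    return sum((a[j] - b[j]) ** 2 for j in features)


def _farthest_first_centers(X, k, features):
    # Deterministic initialization (an assumption, not from the paper):
    # start at X[0], then repeatedly add the point farthest from the chosen centers.
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda x: min(_sq_dist(x, c, features) for c in centers))
        centers.append(list(far))
    return centers


def kmeans_on_features(X, k, features, n_iter=100):
    # Plain Lloyd iterations, with distances computed on `features` only.
    centers = _farthest_first_centers(X, k, features)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: _sq_dist(x, centers[c], features))
                      for x in X]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [X[i] for i in range(len(X)) if labels[i] == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def bcss_per_feature(X, labels):
    # a_j = total SS minus within-cluster SS of feature j (always >= 0).
    n, p = len(X), len(X[0])
    scores = []
    for j in range(p):
        col = [x[j] for x in X]
        mean = sum(col) / n
        total = sum((v - mean) ** 2 for v in col)
        within = 0.0
        for c in set(labels):
            vals = [col[i] for i in range(n) if labels[i] == c]
            m = sum(vals) / len(vals)
            within += sum((v - m) ** 2 for v in vals)
        scores.append(total - within)
    return scores


def top_s_weights(scores, s):
    # Maximizer of sum_j w_j * a_j over {0 <= w_j <= 1, ||w||_0 <= s} when a_j >= 0:
    # weight 1 on the s largest scores, 0 elsewhere.
    keep = set(sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:s])
    return [1.0 if j in keep else 0.0 for j in range(len(scores))]


def l0_kmeans_sketch(X, k, s, n_outer=20):
    # Alternate: cluster on the current features, then re-select the top-s features.
    features = list(range(len(X[0])))  # start from all features
    labels = kmeans_on_features(X, k, features)
    for _ in range(n_outer):
        w = top_s_weights(bcss_per_feature(X, labels), s)
        new_features = [j for j, wj in enumerate(w) if wj == 1.0]
        if new_features == features:
            break  # selected feature set is stable
        features = new_features
        labels = kmeans_on_features(X, k, features)
    return labels, features


# Toy data: 2 informative features (cluster means -5 / +5) plus 8 pure-noise features.
rng = random.Random(0)
X = []
for sign in (-1, 1):
    for _ in range(20):
        informative = [sign * 5 + rng.gauss(0, 0.5) for _ in range(2)]
        noise = [rng.gauss(0, 1) for _ in range(8)]
        X.append(informative + noise)

labels, features = l0_kmeans_sketch(X, k=2, s=2)
print("selected features:", features)
```

On this toy data the two informative features have a BCSS on the order of $n \cdot 5^2$, while the BCSS of each noise feature stays small, so the sketch should keep only features 0 and 1 and recover the two generating clusters. The hard top-$s$ selection is what sets exactly the noise features' weights to zero in this simplified setting.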
